SUMMARY


At the request of the Air Force Manpower and Personnel Center, the Air Force Human Resources
Laboratory (AFHRL) conducted research to develop a methodology for establishing aptitude
requirements for enlistee occupations. The resulting methodology produces job-centered measures
of occupational learning difficulty representing the time required to learn to perform an
occupation satisfactorily. These measures were used as an empirical basis for establishing
aptitude requirements published in the Airman Classification Regulation. They have also been
applied to the Air Force person-job match system for determining enlistee job assignments. To
transfer this technology from a research to an operational setting, it was necessary to
investigate the feasibility of having Air Force personnel, rather than research personnel,
collect the data for use in deriving learning difficulty. This paper summarizes a study
conducted by AFHRL to assess the feasibility of having Air Force personnel from the Air Force
Occupational Measurement Center collect these data to support the operational implementation of
the benchmark learning difficulty technology.
BENCHMARK  LEARNING  DIFFICULTY  TECHNOLOGY:
FEASIBILITY  OF  OPERATIONAL  IMPLEMENTATION 


I.  INTRODUCTION 

Occupational research is an important component of sound organizational management. Although
the value of such research for training management has been well recognized, the use of
occupational data in the selection and classification of personnel is a relatively recent
development.

In 1973, the Air Force Manpower and Personnel Center (AFMPC) requested that the Air Force
Human Resources Laboratory (AFHRL) conduct research to develop an objective procedure for
establishing relative aptitude requirements for enlistee occupations. After 10 years of
extensive research, the Laboratory developed a state-of-the-art technology for this purpose. The
technology produces job-centered measures of occupational learning difficulty. Occupational
learning difficulty is operationally defined as the time it takes to learn to perform an
occupation satisfactorily (Mead, 1970a, 1970b; Mead & Christal, 1970; Lecznar, 1971).

The use of learning difficulty data for establishing relative aptitude requirements is based
on the assumption that aptitude (i.e., aptitude minimums) and learning time (i.e., task learning
difficulty) are related. Considerable research has demonstrated a high correlation between task
aptitude requirement ratings from behavioral scientists and task difficulty ratings from senior
Air Force supervisors (Fugill, 1972). Additional research by Block and Anderson (1975), Cronbach
and Snow (1977), Gettinger and White (1979), and Krumboltz (1965) lends further support to the
notion that aptitude is related to learning time.

In deriving measures of occupational learning difficulty, three types of occupational
information were employed: (a) task time-spent ratings provided by incumbents, (b) supervisory
ratings of task difficulty, and (c) benchmark ratings of task learning difficulty obtained
through evaluations by contract personnel. The first two were available from the Air Force
Occupational Measurement Center (OMC). Benchmark ratings were necessary because supervisory
ratings of task difficulty only provided information concerning the relative order of tasks
within occupations. Consequently, supervisory ratings were not comparable across occupations.
On the other hand, benchmark ratings of task learning difficulty, which are based on
task-anchored benchmark rating scales (Burtch, Lipscomb, & Wissman, 1982), are comparable
across occupations. The benchmark rating scales were designed to capture the range of learning
difficulty characteristic of all tasks within occupations in a given aptitude area.


Benchmark  Procedure 


Benchmark  ratings  were  collected  for  the  ultimate  purpose  of  adjusting  supervisors'  task 
ratings  so  they  would  be  comparable  across  occupations.  To  date,  occupational  learning 
difficulty  measures  have  been  derived  for  more  than  200  enlisted  Air  Force  Specialties  (AFSs), 
representing  approximately  100,000  tasks  and  170,000  enlisted  positions. 

The procedure followed by the contractor to collect benchmark learning difficulty ratings for
a particular specialty consisted of the following steps. First, 60 tasks from the associated
job/task inventory were selected based on the following criteria:

1.  Non-supervisory:  tasks  performed  solely  by  supervisors  were  eliminated. 

2.  Captured the range of difficulty: tasks were selected to represent the range of
supervisor relative ratings of learning difficulty.


3.  High  rater  agreement  on  the  difficulty  of  the  task:  tasks  were  selected  for  which  there 
was  high  agreement  among  raters  when  judging  learning  difficulty. 

4.  Performed by first-term (1 to 46 months) personnel: tasks were selected that were
performed by persons in their first term of enlistment.

5.  Frequently  performed:  tasks  that  were  most  frequently  performed  were  selected. 

6.  Easily  observed:  tasks  that  could  be  easily  observed  were  selected. 

7.  High  face  validity:  tasks  were  selected  that  appeared  valid  for  representing  the  range 
of  learning  difficulty  associated  with  the  occupation. 

The content validity of each of these tasks was then confirmed with technical school instructors.

Second, a group of 14 raters, each having expert knowledge of the tasks to be rated, visited
operational sites to interview job incumbents and gather information regarding performance of
each of the 60 tasks. The raters were divided into two teams of seven members each, and each team
visited a different operational site to obtain information for the same set of tasks. Third,
after becoming familiar with the tasks to be rated, each team member independently provided
ratings of task learning difficulty using a benchmark rating scale. Briefly, a benchmark rating
scale is a 25-point scale with each point anchored or defined by two tasks of equivalent learning
difficulty (Burtch, Lipscomb, & Wissman, 1982). Fourth, the ratings were averaged across all
raters. Fifth, the average benchmark difficulty ratings were used to adjust average task ratings
of relative learning difficulty supplied by occupational supervisors. These adjusted task
ratings are referred to in this paper as benchmark learning difficulty estimates. Unlike
supervisory ratings, the difficulty estimates are comparable across occupations. The benchmark
learning difficulty estimate for each task was then multiplied by the time spent performing
that task by each occupational incumbent, and the products were summed across tasks and divided
by a factor of 10 to produce an index referred to as Average Task Difficulty Per Unit Time
Spent (ATDPUTS). Although the ATDPUTS are computed on a position-by-position basis, they are
averaged to represent occupational learning difficulty for specified incumbent groups.
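
The ATDPUTS computation described above can be sketched as follows. This is a minimal illustration only: it assumes that time spent is expressed as each incumbent's percent of total job time per task, and the function names and sample values are hypothetical rather than taken from the study.

```python
def atdputs(difficulty, percent_time):
    """Index for one position: sum of (difficulty estimate x time spent), divided by 10."""
    if len(difficulty) != len(percent_time):
        raise ValueError("one time-spent value is needed per task")
    return sum(d * t for d, t in zip(difficulty, percent_time)) / 10.0


def mean_atdputs(difficulty, positions):
    """Average the position-level indices over a specified incumbent group."""
    values = [atdputs(difficulty, p) for p in positions]
    return sum(values) / len(values)


# Hypothetical data: three tasks, two incumbents (assumed percent time sums to 100).
difficulty = [10.0, 14.0, 18.0]
positions = [[50.0, 30.0, 20.0], [20.0, 30.0, 50.0]]
group_index = mean_atdputs(difficulty, positions)
```

Under the percent-time assumption, an incumbent who spends more time on the harder tasks receives a higher index, and a 25-point difficulty scale yields indices between 10 and 250.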

Once the estimates of occupational learning difficulty were available, they were used for two
separate organizational purposes. First, they were used in combination with training and
recruiting information for establishing aptitude requirement minimums stated in AFR 39-1, Airman
Classification Regulation. Second, as described by Weeks (1984), these measures were used in the
Air Force person-job match algorithms for determining individual assignments to specialties.
Since measures of occupational learning difficulty will be used for these purposes in the future,
it is important that the associated measurement procedure be transferred to an operational
organization for routine implementation.

Objective  of  the  Present  Study 

To transfer the learning difficulty measurement technology from a research to an operational
setting, it was necessary to investigate the possibility of having benchmark ratings routinely
collected by Air Force personnel rather than by contract job analysts. Hence, AFHRL, in
coordination with the OMC, conducted a study to assess the feasibility of having Air Force
personnel collect benchmark ratings for use in deriving measures of occupational learning
difficulty. This paper describes the approach taken and results obtained in addressing this
research issue.


II.  METHOD 


Selection  of  Specialties 


Nine Air Force enlisted personnel specialties were selected for assessment:

Air Force Specialty
Code (AFSC)            Specialty
-------------------    ----------------------------------------
251X0                  Weather
272X0                  Air Traffic Control
316X3                  Instrumentation
325X0                  Automatic Flight Control Systems
325X1                  Avionics Instrument Systems
426X2                  Jet Engine Mechanic
426X3                  Turboprop-Propulsion Mechanic
431X1                  Tactical Aircraft Maintenance
431X2                  Airlift/Bombardment Aircraft Maintenance

The  selection  of  these  AFSs  was  based  on  the  following  criteria:  first,  benchmark  learning 
difficulty  ratings  collected  by  contractor  personnel  were  available,  and  second,  the  specialties 
were  representative  of  the  Mechanical,  Electronic,  and  General  aptitude  areas. 


Development  of  Task  Lists 


For each specialty, the representative sample of tasks originally used by the contractor was
obtained. Each sample consisted of 60 tasks selected from job inventories containing between 518
and 1124 tasks. The 60 tasks were those originally selected during the initial contract effort.

Selection  of  Raters 


Once the selection of specialties and development of task lists had been accomplished, the
next step was to select Air Force personnel to collect benchmark task difficulty ratings for each
of the nine specialties. For this purpose, OMC provided eight inventory developers and job
analysts to serve as raters. OMC personnel were especially suited to participate as raters
because of their expertise in collecting and analyzing job and task data. The OMC raters
consisted of four military and four civilian personnel. The selected personnel had an average of
5 years' experience in the areas of Air Force inventory development and occupational analysis.


Training  of  Raters 


To familiarize the OMC personnel with the specialized procedures used in collecting benchmark
data, a 3-day training workshop was conducted at AFHRL. Training consisted of instruction and
exercises on procedures for analyzing tasks to determine learning difficulty. An important
aspect of the training dealt with the use of the benchmark rating scales and how ratings are
assigned. Training also focused on the conduct of interviews, the principal method for gathering
information on the tasks for which ratings of learning difficulty are produced. Procedural
guides for using the benchmark scales were the prime training materials. The guides included
definitions of assessment criteria and descriptions of each task anchor on each benchmark scale.
Attention was placed on the use of eight task assessment criteria as guidelines in determining
exactly what each task is, how it is performed, and what skills or knowledge are required to
perform it adequately. The eight assessment criteria are (a) task definition, (b) number of
steps in a task, (c) tools and equipment unique to the task, (d) regulations, manuals, and
standard operating procedures, (e) memorization, (f) standards of performance, (g) time
criticality, and (h) basic skills or knowledge. The course objectives for the training are
provided in Appendix A. Definitions of each of the criteria are provided in Appendix B.

Upon completion of the 3-day training workshop, the eight OMC raters were divided into two
teams of four members each. The two teams then conducted practice sessions in the field,
accompanied by an AFHRL instructor, to apply the various techniques that had been learned for use
in deriving benchmark task learning difficulty ratings. In addition, these practice sessions
were conducted to assess the effectiveness of the training and to determine any problem areas
where more training might be warranted.

The specialties studied during the practice sessions were Air Force Specialty Code (AFSC)
811X0, Security Police, and AFSC 341X5, Analog Navigation/Tactics Training Devices. These
specialties were selected due to the availability of personnel in the local area. Both practice
sessions were conducted at Randolph AFB.

Each team independently interviewed a 3-skill-level and a 5-skill-level airman from AFSC
811X0 and AFSC 341X5. Each interview session lasted about 4 hours. During the sessions, OMC
team members queried the airmen regarding the performance of each of the 60 tasks. Questions
were directed toward task performance relative to the eight assessment criteria previously listed.

After each task on the list had been addressed and team members were confident that they had
acquired enough information about the performance of each task, the session was terminated. Each
team member independently rated the learning difficulty of each task using the appropriate
benchmark task difficulty rating scale. Individual ratings for each task were then summed and
divided by four to derive an average team rating for each task.

For ratings collected during the practice sessions, acceptable levels of interrater agreement
were reached for both team 1 and team 2 (Rkk = .72 for team 1 and .80 for team 2). In addition,
both teams agreed that training had been effective and that the two practice sessions were
helpful in developing an understanding of the rating procedures.


Field  Study 

Once OMC personnel had been trained in the benchmark rating procedure, each team visited
different operational sites to interview occupational incumbents from each of nine enlisted
specialties. On the basis of information gathered during these interviews, each team member
independently rated each of the 60 tasks for each specialty in terms of learning difficulty.
Data collected during this phase were to be used in the assessment of the feasibility of
operational implementation of the learning difficulty technology. Operational site visits were
made to Randolph, Kirtland, Holloman, Davis-Monthan, and Barksdale AFBs during a 3-week period.
A listing of the specialties studied by base and by team is provided in Appendix C.

At each site, two or more airmen from each specialty were interviewed as a group. Each
interview session lasted approximately 4 hours. After the interviews, each team member
independently assigned a benchmark rating to each task. Independent ratings were then averaged
across raters within each team to yield the average benchmark difficulty rating for each task
rated. In addition, ratings of task difficulty were averaged across team 1 and team 2 members to
provide average task ratings for the total team. This was done for each of the nine specialties
studied.

Upon completion of the operational site visits, OMC teams forwarded their ratings to AFHRL
for data processing and analyses. In analyzing the data from the OMC field study, data
originally collected by the contractor were used for comparative purposes.


III.  ANALYSIS 


There were three major goals of the analyses. The first goal was to determine the internal
consistency of the 25-point benchmark task difficulty ratings collected by OMC team members.
Descriptive statistics on the ratings were obtained for OMC teams 1 and 2 and for contractor
teams 1 and 2. Reliability of ratings was assessed using the CODAP program REXALL (Christal &
Weissmuller, 1976), which computes an index of interrater agreement. Reliability indices were
derived for OMC team 1, team 2, and total team and for contractor team 1, team 2, and total
team. In addition, intercorrelations for OMC team 1 vs. team 2 and contractor team 1 vs. team 2
were computed.

The second goal of the analyses was to determine the validity of the OMC benchmark ratings.
Validity was assessed first by comparing the characteristic response patterns of the OMC total
team and contractor total team by specialty and second by comparing the relationship between OMC
teams' and supervisors' ratings of task difficulty and contractor teams' and supervisors' ratings
of task difficulty. Relationships among benchmark ratings were analyzed using TRICOR, a general
purpose correlation program (Black, 1978).

The third goal of the analysis was to compare measures of occupational learning difficulty
(ATDPUTS) generated from OMC benchmark data with those generated by the contractor. ATDPUTS were
generated for each of the nine specialties based on OMC total team benchmark ratings.
Specialties were then rank ordered (from highest to lowest) in terms of their associated
ATDPUTS. The ranking of specialties was then compared to the ranking of specialties based on
contractor-derived ATDPUTS, to determine the consistency of rankings between the two teams.


IV.  RESULTS 


Reliability


A comparison of OMC teams 1 and 2 in terms of average ratings of task learning difficulty is
presented in Table 1, together with the associated standard deviations by specialty. Seven of
the nine specialties were found to be roughly equivalent, falling around the mid-point of the
scale. Significant differences were found, however, between teams for AFSC 251X0, Weather
(t(102) = 4.69, p < .05), and AFSC 426X2, Jet Engine Mechanic (t(94) = 2.42, p < .05).
Differences were attributable to the fact that team 1 tended to rate tasks higher in learning
difficulty than did team 2 for AFSC 251X0, whereas team 1 tended to rate tasks lower than did
team 2 for AFSC 426X2.


Table 1.  Average Task Difficulty Ratings
for OMC Team 1 and Team 2 Personnel

             OMC Team 1         OMC Team 2
              (N = 4)            (N = 4)
AFSC        Mean     SD        Mean     SD

251X0*      15.0    3.23       11.9    3.77
272X0       14.4    2.87       14.1    2.84
316X3       13.8    3.26       13.9    3.86
325X0       13.9    2.79       13.1    3.19
325X1       13.2    2.08       13.5    3.42
426X2*      11.9    2.34       13.5    3.98
426X3       13.6    2.39       13.3    3.75
431X1       12.6    1.91       13.4    2.50
431X2       12.8    1.94       12.6    2.54

*An asterisk denotes a significant difference (p < .05).


Descriptive statistics for contractor teams 1 and 2 are provided in Appendix D. For eight of
the nine specialties, no significant differences were found between teams in the average ratings
of difficulty. A significant difference was found for AFSC 251X0, Weather. For this specialty,
contractor team 1, on the average, rated tasks lower than did team 2.

Estimates of interrater reliability (Rkk) for rating groups were obtained for OMC total
team (k = 8), team 1 (k = 4), and team 2 (k = 4) and for contractor total team (k = 14), team 1
(k = 7), and team 2 (k = 7). Reported Rkk values for the contractor teams were based on an
N of 8 for the contractor total team and an N of 4 for each of the two contractor teams. This
adjustment was performed using the Spearman-Brown formula to render contractor team data more
comparable to OMC team data. Across the nine specialties studied, the range of Rkk values for
the OMC total team was .81 to .95, and the median (Mdn) was .90. The ranges of Rkk values for
OMC team 1 and team 2 were .62 to .89 (Mdn = .81) and .74 to .94 (Mdn = .93), respectively. For
the contractor total team, Rkk values ranged from .87 to .95 (Mdn = .94) across the nine
specialties, while values ranged from .62 to .89 (Mdn = .81) for contractor team 1 and .89 to .94
(Mdn = .93) for contractor team 2.
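
The Spearman-Brown adjustment mentioned above can be illustrated with a short sketch. The function name and the example reliability value are hypothetical; the formula itself is the standard Spearman-Brown prophecy formula, applied here on the assumption that the contractor reliabilities were restated for a smaller number of raters (for example, from 14 raters to 8).

```python
def spearman_brown(reliability, length_factor):
    """Reliability of a mean rating when the number of raters is scaled by length_factor."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical example: a reliability of .95 observed with 14 raters, restated for 8 raters.
adjusted = spearman_brown(0.95, 8 / 14)
```

A length factor below 1 lowers the reliability estimate, which is why the adjusted contractor values can be compared more fairly with those of the smaller OMC teams.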

Intercorrelations among ratings for OMC team 1 and team 2 and contractor team 1 and team 2
were computed as a further check of the internal consistency of team ratings. Correlation
coefficients for OMC team 1 vs. team 2 and contractor team 1 vs. team 2 for each of the nine
specialties are presented in Table 2. The significance of each correlation coefficient was
evaluated using a critical-ratio z test. All correlations indicated significant agreement (p <
.05) within rating groups.
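
The report does not give the exact form of the critical-ratio z test. One common version, sketched below as an assumption, treats r as approximately normal under the null hypothesis with standard error 1/sqrt(N - 1), where N is the number of tasks. The function names are illustrative, and N = 43 is an assumed task count, not a value reported for any particular specialty.

```python
import math


def critical_ratio_z(r, n):
    """Critical ratio for H0: rho = 0, using the large-sample standard error 1/sqrt(n - 1)."""
    return r * math.sqrt(n - 1)


def is_significant(r, n, z_crit=1.96):
    """Two-tailed test at the .05 level (z_crit = 1.96)."""
    return abs(critical_ratio_z(r, n)) > z_crit


# Assumed N of 43 tasks; .43 is the lowest team 1 vs. team 2 correlation in Table 2.
z_lowest = critical_ratio_z(0.43, 43)
```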

Table 2.  Correlations Between OMC Team 1 and Team 2
and Contractor Team 1 and Team 2

             OMC Team 1       Contractor Team 1
AFSC         vs. Team 2       vs. Team 2

251X0           .81              .79
272X0           .62              .72
316X3           .45              .84
325X0           .86              .81
325X1           .74              .78
426X2           .78              .80
426X3           .80              .84
431X1           .43              .81
431X2           .62              .61

Note.  All correlations are significant (p < .05).


Validity 


The means and standard deviations for OMC and contractor total teams across each of the nine
specialties are presented in Table 3. Contractor teams originally rated 60 tasks for each
specialty. For purposes of analysis, however, only those tasks rated by OMC teams were
considered in analyzing contractor data. These task subsets included from 43 to 55 of the
original 60 tasks. In general, the average task difficulty ratings by OMC and contractor
personnel were roughly equivalent, falling around the mid-point of the 25-point benchmark
scales. Significant differences in mean values between teams were found, however, for AFSC
251X0, Weather (t(102) = 3.28, p < .05); 272X0, Air Traffic Control (t(108) = 5.62, p < .05);
and 431X2, Airlift/Bombardment Aircraft Maintenance (t(106) = 2.31, p < .05).
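
As a rough check on how such t values arise, the sketch below computes a pooled-variance t statistic from summary statistics. It assumes an independent-samples test with equal group sizes inferred from the reported degrees of freedom (102 df implies 52 tasks per group); the report does not state which form of the t test was used, and the function name is illustrative. The inputs are the rounded Table 3 means and standard deviations for AFSC 251X0.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t with pooled variance; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# AFSC 251X0: OMC mean 13.5 (SD 3.34) vs. contractor mean 11.3 (SD 3.55), assumed 52 tasks each.
t_value, df = pooled_t(13.5, 3.34, 52, 11.3, 3.55, 52)
```

Under these assumptions the result lands near the reported value of 3.28; a small gap is to be expected because the tabled means are rounded to one decimal place.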


Table 3.  Average Task Difficulty Ratings
by OMC and Contractor Personnel

                OMC               Contractor
AFSC        Mean     SD        Mean     SD

251X0*      13.5    3.34       11.3    3.55
272X0*      14.2    2.56       11.5    2.50
316X3       13.8    3.03       13.1    2.94
325X0       13.5    2.88       13.9    2.94
325X1       13.3    2.58       13.0    2.58
426X2       12.7    2.98       13.3    3.19
426X3       13.5    2.88       13.3    2.93
431X1       13.0    1.87       13.1    2.26
431X2*      12.7    2.03       13.6    1.90

*An asterisk denotes a significant difference (p < .05) between OMC
and contractors in terms of their average ratings.


Intercorrelations among ratings were computed for each specialty for OMC total team, team 1,
and team 2; contractor total team, team 1, and team 2; and supervisors. Of primary interest were
the correlations between OMC and supervisors, contractors and supervisors, and OMC and
contractors. Supervisory ratings were used as the criterion measure. Correlation coefficients
for each of these three comparisons are provided in Table 4. Correlations for seven of the nine
specialties were sufficiently high to suggest a strong relationship among teams' ratings of
learning difficulty. Significant differences (p < .05) between the OMC vs. supervisors and
contractors vs. supervisors correlations were found for AFSC 426X2, Jet Engine Mechanic; AFSC
431X1, Tactical Aircraft Maintenance; and AFSC 431X2, Airlift/Bombardment Aircraft Maintenance.
In the case of 426X2, the correlation between contractors and supervisors was somewhat lower
than between OMC and supervisors. For 431X1 and 431X2, the OMC team correlated substantially
lower with supervisors than did the contractors with supervisors. Full correlation matrices for
each of the nine specialties for all seven rating groups are provided in Appendix D.
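
The comparison of two correlations that share the supervisory criterion is a dependent-correlations problem, which Table 4 attributes to a Hotelling t. A minimal sketch of Hotelling's statistic follows; the function name is illustrative, and N = 50 tasks is an assumed value, since the exact task count per specialty (between 43 and 55) is not given here.

```python
import math


def hotelling_t(r_xy, r_zy, r_xz, n):
    """Hotelling's t for H0: rho_xy = rho_zy, where both correlations share variable y.

    r_xz is the correlation between the two non-shared variables; df = n - 3.
    """
    det = 1.0 - r_xy ** 2 - r_zy ** 2 - r_xz ** 2 + 2.0 * r_xy * r_zy * r_xz
    t = (r_xy - r_zy) * math.sqrt((n - 3) * (1.0 + r_xz) / (2.0 * det))
    return t, n - 3


# AFSC 431X1 (Table 4): OMC vs. supervisors .59, contractors vs. supervisors .88,
# OMC vs. contractors .75; N = 50 is assumed.
t_value, df = hotelling_t(0.59, 0.88, 0.75, 50)
```

The negative sign reflects the lower OMC correlation with supervisors for this specialty.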


Table 4.  Correlations Between OMC and Contractor Ratings
with Supervisors

             OMC vs.         Contractors vs.     OMC vs.
AFSC         Supervisors     Supervisors         Contractors

251X0           .75              .83                .79
272X0           .74              .77                .86
316X3           .71              .77                .89
325X0           .85              .89                .88
325X1           .81              .78                .88
426X2*          .82              .73                .91
426X3           .78              .79                .90
431X1*          .59              .88                .75
431X2*          .54              .79                .72

*An asterisk denotes a significant difference (p < .05) between rating
groups based on a Hotelling t.

Correlations of OMC team 1 and OMC team 2 with supervisors are presented in Table 5. For
OMC team 1, significant agreement (p < .05) was found for eight of the nine specialties.
Significance of r was again tested using the critical-ratio z test. For AFSC 431X1, Tactical
Aircraft Maintenance, the obtained r did not reach the .05 level of significance, indicating a
lack of agreement between OMC team 1 and supervisors. The correlation coefficients obtained for
team 2 vs. supervisors were all found to be significant.


Table 5.  Correlations Between OMC Team 1
and Team 2 with Supervisors

             OMC Team 1 vs.     OMC Team 2 vs.
AFSC         Supervisors        Supervisors

251X0           .74                .70
272X0           .60                .74
316X3           .66                .56
325X0           .84                .80
325X1           .71                .79
426X2           .82                .75
426X3           .75                .74
431X1           .29                .68
431X2           .36                .59

Note.  All correlations are significant (p < .05) with the
exception of OMC team 1 vs. supervisors for 431X1, based on a
critical-ratio z test.


Correlation coefficients for contractor teams 1 and 2 vs. supervisors are provided in
Appendix D. For both teams, significant agreement (p < .05) was found with supervisors.

Comparability of ATDPUTS

Average Task Difficulty Per Unit Time Spent (ATDPUTS) values were computed for first-term
airmen for each specialty based on OMC team 25-point benchmark ratings. Previously derived
ATDPUTS based on contractor total team benchmark ratings were examined for comparison purposes.
ATDPUTS based on OMC total team and contractor total team are shown in Table 6 for
first-term airmen across the nine specialties. For each team, ATDPUTS are ranked in descending
order, from highest to lowest. The ranking derived for the OMC total team was compared with that
for the contractor total team. A Spearman rho correlation coefficient was computed to determine
the relationship between the two rankings. The value of the Spearman rho was +.3 (ns, p > .05).
For seven of the nine specialties (AFSCs 316X3, 325X0, 325X1, 431X2, 251X0, 426X3, and 431X1),
the difference in rank was within one or two places, and in some cases, specialty ranks were
identical. However, for AFSCs 272X0 and 426X2, rank differences were seven and five ranked
places, respectively, between OMC and contractor ATDPUTS.
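
The rank comparison can be reproduced with a short sketch. It assumes the ranks are the row positions listed in Table 6 (so tied ATDPUTS values, such as the two OMC values of 132, take their listed order rather than averaged ranks); the function and variable names are illustrative.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rho from two rankings of the same items (no tie correction)."""
    n = len(rank_a)
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))


# Specialties in descending ATDPUTS order, as listed in Table 6.
omc_order = ["272X0", "426X3", "316X3", "325X0", "325X1",
             "426X2", "431X1", "431X2", "251X0"]
contractor_order = ["426X2", "426X3", "325X0", "325X1", "316X3",
                    "431X2", "431X1", "272X0", "251X0"]

omc_rank = {afsc: i + 1 for i, afsc in enumerate(omc_order)}
contractor_rank = {afsc: i + 1 for i, afsc in enumerate(contractor_order)}
rho = spearman_rho(omc_rank, contractor_rank)
```

With these listed ranks the squared rank differences sum to 84, which gives the reported value of +.3; most of that sum comes from AFSCs 272X0 and 426X2.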


Table 6.  OMC and Contractor ATDPUTS for
First-Term Airmen

            OMC                        CONTRACTOR
AFSC       Mean      SD          AFSC       Mean      SD

272X0*     132      6.73         426X2*     144      8.92
426X3      132      6.80         426X3      131      6.51
316X3      131      9.24         325X0      130      9.25
325X0      129      8.89         325X1      128      7.41
325X1      129      6.77         316X3      123     10.01
426X2*     128     10.50         431X2      122      8.44
431X1      120      5.83         431X1      112     10.18
431X2      118      6.54         272X0*     106      7.05
251X0      114     13.49         251X0       89     17.7

*An asterisk denotes those AFSCs where significant differences
in rankings were found between the two teams.


V.  DISCUSSION  AND  CONCLUSIONS

The analysis reported herein assessed the feasibility of having Air Force personnel collect
benchmark task learning difficulty ratings for use in deriving measures of occupational learning
difficulty. This assessment was necessary to evaluate the transferability of the benchmark
learning difficulty technology from a research to an operational setting.

Overall, results of the analysis were positive. That is, they supported the feasibility of
having Air Force personnel, in particular OMC personnel, collect benchmark ratings. The
interrater reliabilities, with medians of .81 to .93 across OMC rating groups, showed
high agreement for each of the specialties studied. Reliability of OMC ratings was consistent
with reliabilities obtained for contractor ratings. Hence, the reliability of OMC ratings was
acceptable.

For the majority of specialties, OMC average ratings were consistent with contractor average
ratings. Significant differences were noted, however, for AFSC 251X0, Weather; 272X0, Air
Traffic Control; and 431X2, Airlift/Bombardment Aircraft Maintenance. These differences may be
attributed to any number of sources; for example, variations in procedures for collecting ratings
(i.e., site differences and equipment differences). More study would be helpful in answering
some of the questions regarding these differences. However, a more practical approach would be
to ensure that prescribed data collection procedures are explicitly and consistently followed in
the future. If differences are noted, then it may be necessary for rating teams to return to the
field to repeat the data collection phase in an attempt to provide more consistent ratings
across teams. Such was the procedure used by the contractor teams. This study did not allow for
repeating the data collection, given its status as a feasibility study.

Correlation coefficients between benchmark scale ratings and the criterion (supervisory
ratings) across OMC rater groups and Air Force specialties were generally high and positive.
Exceptions, however, were found for the AFSC 431X1, Tactical Aircraft Maintenance, and AFSC
431X2, Airlift/Bombardment Aircraft Maintenance, specialties. For AFSC 431X1, OMC total team and
team 1 ratings correlated substantially lower with supervisory ratings than did contractor team
ratings. The reasons for this were not readily evident. For the Airlift/Bombardment
Aircraft Maintenance specialty (AFSC 431X2), OMC team ratings again correlated lower with
supervisory ratings than did the contractor's. Lower correlations could be attributable (a) to
individual differences in the perception of task difficulty, (b) to the wide variation in
aircraft being maintained by the incumbents interviewed (e.g., C-130s at Kirtland and B-52s at
Barksdale), and (c) to the level of experience of airmen interviewed. Differences in contractor
team ratings may be attributable to these factors as well.
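
The validity check discussed above rests on correlating a team's task ratings with the
supervisory ratings for the same tasks. The following is a minimal sketch of that computation,
assuming a Pearson product-moment correlation over per-task values. The function name and the
six data points are hypothetical and serve only as illustration; they are not values from this
study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-task values (NOT data from the report):
    # mean benchmark rating per task from one team, and the
    # supervisory difficulty rating for the same tasks.
    team_task_means = [8.5, 11.0, 12.5, 14.0, 16.5, 10.0]
    supervisor_ratings = [3.1, 4.0, 4.6, 5.2, 6.0, 3.9]
    print(round(pearson_r(team_task_means, supervisor_ratings), 2))
```

In practice each sequence would hold one value per benchmarked task, so a specialty with 60
rated tasks would contribute 60 pairs.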

In general, ATDPUTS derived on the basis of the OMC total team and contractor total team
benchmark ratings were fairly consistent. Exceptions, however, were AFSCs 272X0, 426X2, and
251X0. ATDPUTS based on OMC-collected ratings for AFSC 272X0 were markedly higher than
contractor ATDPUTS. In fact, AFSC 272X0 was ranked highest among the nine specialties studied by
OMC; this difference can be attributed to the fact that OMC teams, on the average, rated tasks
higher in learning difficulty than did the contractor teams for this specialty. ATDPUTS
differences for AFSC 426X2, Jet Engine Mechanic, result from lower ratings being applied by OMC
team 1 in comparison to OMC team 2 and contractor teams. This may be explained by the
differences in equipment maintained at the operational sites visited. For example, at
Davis-Monthan AFB, jet engine mechanics perform maintenance primarily on the A-10 aircraft, while
at Kirtland AFB, a number of varied aircraft are maintained because it is a transient aircraft
facility. Given that more equipment is maintained at Kirtland, tasks there may be perceived as
more difficult to learn than the same tasks performed on only one type of aircraft. Thus, it is
theorized that equipment differences serve to account for
ATDPUTS differences. For AFSC 251X0, a higher ATDPUTS value for OMC in comparison to the
contractors can again be attributed to OMC team 1 ratings. OMC ratings of learning difficulty
were significantly higher than those of the other teams.

The data reported herein support the hypothesis that Air Force personnel can collect
benchmark learning difficulty ratings. The evidence indicates that these ratings are both
reliable and valid. These findings support the transferability of the benchmark technology. In
transferring this technology, however, attention must be focused on some of the problems and
issues associated with the collection of benchmark task difficulty ratings.

The primary issues to be considered in implementation are (a) the effects of task
variability, (b) the selection of occupational incumbents to interview, and (c) the adequacy of
task anchor definitions. In addition, several secondary issues need to be examined to assure
proper utilization of the procedure for collecting benchmark ratings.


Task  Variability 


A key issue that has been noted during application of the benchmark learning difficulty
scales is that of task variability. In discussing task variability, it is useful to consider how
a task is defined. According to Morsh and Archer (1967), a task is defined as "a unit of work
activity which forms a significant part of a duty." A task statement written for inclusion in
a job inventory should (a) be understood by job incumbents, (b) be time ratable, (c) have a
beginning and an end, and (d) be unambiguous. A major constraint on task specificity is that
task lists (or job inventories) should take an airman no more than 2 hours to complete; hence,
the number of tasks should generally be less than 1,000. Given these constraints, task
statements can become broad. In particular, task statements become broad (a) when several
specialties are described in a single task inventory (e.g., AFSC 325X0, Automatic Flight Control
Systems, and AFSC 325X1, Avionics Instrument Systems), (b) when many aircraft are covered in a
single task inventory (e.g., AFSC 431X1A-Z, Tactical Aircraft Maintenance, and AFSC 431X2A-E, Z,
Airlift/Bombardment Aircraft Maintenance), (c) when specialties have a large amount of
nonstandard equipment (e.g., AFSC 316X3, Instrumentation; AFSC 918X0, Biomedical Equipment
Maintenance; or AFSC 324X0, Precision Measuring Equipment), and (d) when specialties have large
amounts of equipment (e.g., AFSC 304X4, Ground Radio Communications).

Tasks that are too broad pose serious problems in terms of getting accurate estimates of
learning difficulty. To exemplify, in the present data collection effort, incumbents from AFSC
325X0, Automatic Flight Control Systems, and AFSC 325X1, Avionics Instrument Systems, were asked
to explain the following task: "Adjust attitude heading reference systems components." This
task appeared in a single task inventory for both AFSs. When this task was explained by the AFSC
325X0 personnel, it was performed in-shop only, required special equipment (e.g., oscilloscope,
directional gyro, three-axis table), required technical orders, required special field detachment
training, and averaged 6 to 8 hours to perform. When described by AFSC 325X1 personnel, the only
requirements for task performance were the technical orders and a screwdriver. It is possible
that since the attitude heading reference system is an instrumentation system, the procedures
used in the performance of this task are less complex for the AFSC 325X1, Avionics Instrument
Systems specialist, than for the AFSC 325X0, Automatic Flight Control Systems specialist. The
information provided by the AFSC 325X0 and 325X1 personnel for the same task statement can be
viewed as a reflection of an actual difference in the tasks and jobs performed by personnel in
these two specialties. This could present a problem if not treated carefully because it could
result in wide variations in learning difficulty estimates for the same task in the same job
inventory. While these variations may point to a problem with the benchmark methodology, they
appear to be more reflective of the way inventories are constructed and analyzed.

The decision to study two or more specialties using the same job inventory is based on the
perceived similarity in the jobs performed by personnel in the specialties under study. This was
the case with the AFSC 325X0 and 325X1 specialties, for example. However, the jobs and tasks
were very different; hence the differences in perceived learning difficulty estimates. One
solution may be to develop job inventories to study each Air Force specialty separately. An
alternative would be to better specify tasks on job inventories when specialties are studied
together, or job analysts could analyze supervisory ratings separately for each specialty.


Interviewing  of  Occupational  Personnel 


Another key issue pertains to the recommended personnel for providing task information about
specialties. In this study, 3-level (apprentice) or junior 5-level (journeyman)
airmen were interviewed to gather information for evaluating tasks in terms of their learning
difficulty. Based on the experiences of the OMC rating teams, however, these personnel appeared
to be limited in knowledge and experience on the performance of key tasks, resulting in the
rating teams' inability to rate those unaddressed tasks. Across specialties, between 5 and 17
tasks could not be rated because little or no information was provided by the interviewees. To
assure that a greater number of tasks could be rated, it is recommended that senior 5-level and
7-level personnel be used during interviews since their range of knowledge and experience would
be greater than that of junior personnel.


Anchor  Task  Definitions 


Focusing on the procedural guides (Hart, 1976) used for applying benchmark ratings, it has
been noted by team members that many of the anchor tasks on the benchmark scales are not
adequately defined in the available procedural guides. According to the procedure for
benchmarking, a task should be evaluated relative to eight assessment criteria (see Appendix B).
However, some of the anchor tasks on the benchmark scale are poorly defined and are not evaluated
for all eight criteria. This has resulted in the raters having difficulty in comparing tasks to
be rated against those in the benchmark scales. Therefore, it is recommended that the definition
of the anchor tasks be refined, more specific evaluations be made on the eight criteria, and new
procedural guides be developed. It may also be necessary to evaluate each anchor task on the
benchmark scales to determine whether (a) the task is still performed in the AFS; (b) the task is
still performed by first-term airmen; (c) the task is representative in terms of task type,
equipment, system or tools used; and (d) the task is difficult to comprehend or is worded in a
manner that might confuse the rater. If it is determined that an anchor task does not meet these
criteria, it should be replaced by a task that does.


Secondary  Issues 

The issues of task variability, recommended personnel to be interviewed, and definition of
anchor tasks are viewed as the primary issues regarding the transfer of this technology. In
addition to these issues, a number of secondary issues need to be addressed.


How  many  and  which  tasks  should  be  benchmarked? 

In this study, as well as the previous learning difficulty research, 60 tasks were rated for
each specialty. This seems to be a reasonable number of tasks to use for future data collection
efforts as well. Given the fact that it takes an average of 4 hours to interview personnel on
the full set of 60 benchmark tasks, increasing the number of tasks to be rated would in turn
increase the time required to conduct interviews. In addition, confounding variables (e.g.,
fatigue, motivation, job requirements, and demands) may interfere such that tasks rated toward
the end of the interview would not be considered as extensively by the airmen as the first, thus
yielding invalid results. Selecting fewer tasks to benchmark may adversely affect the stability
of the learning difficulty estimates for the specialty studied. Hence, 60 tasks is considered
most practical for use in benchmarking. Tasks selected to be benchmarked should be based on the
task selection criteria outlined in Section II of this paper.


What  specialty  data  are  necessary  for  benchmarking? 


Specialty data necessary for benchmarking include (a) a recent job inventory, (b)
task-time-spent data from incumbents, and (c) supervisory task difficulty ratings.


When  should  a  specialty  be  benchmarked? 


The decision to benchmark a specialty is a policy decision rather than a scientific
decision. An obvious and practical time is whenever a new job inventory is developed reflecting
major revisions from the previous job inventory or when major changes occur in a specialty (e.g.,
restructuring).


Who  should  collect  benchmark  ratings? 


In this study, inventory developers and job analysts from OMC collected benchmark task
learning difficulty ratings for each of nine specialties. It is recommended that inventory
developers and job analysts be trained in the use of the benchmark scales and associated
procedural guides and be used in future data collection efforts because they possess a broad
knowledge of Air Force enlisted specialties and have the expertise in routinely collecting and
analyzing task factor data. The results of this research support the use of these personnel for
collecting benchmark ratings in future data collection efforts. The experience and knowledge of
these personnel contributed substantially in collecting reliable and valid benchmark difficulty
data.


How  many  raters  should  there  be  per  team? 


In the present study, two teams of four raters each were used to collect benchmark ratings,
whereas the original contractor study utilized two teams of seven raters each. To address the
issue of how many raters per team are needed, interrater reliability analyses were conducted. In
performing this analysis, the median R11 (.61) for both OMC teams combined was used to
calculate the number of persons required per rating team to achieve an Rkk of approximately
.90. The results of the analysis indicated that a team size of five proved more practical based
on an obtained Rkk of .89. Thus, for future data collection efforts, an N of 5 members per
team is recommended.
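
The team-size calculation above can be reproduced with a short sketch. It assumes the projection
was made with the Spearman-Brown formula, Rkk = k * R11 / (1 + (k - 1) * R11), which this section
does not state explicitly. Under that assumption, the reported single-rater value of .61 projects
to about .89 for five raters, which agrees with the figure given above. The function names are
illustrative only.

```python
import math


def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the mean of k raters (Spearman-Brown),
    given the reliability of a single rater."""
    return k * r_single / (1.0 + (k - 1) * r_single)


def raters_needed(r_single: float, target: float) -> int:
    """Smallest whole number of raters whose projected reliability
    reaches the target, from the Spearman-Brown formula solved for k."""
    k = target * (1.0 - r_single) / (r_single * (1.0 - target))
    return math.ceil(k - 1e-9)  # small tolerance for floating-point error


if __name__ == "__main__":
    r11 = 0.61  # median single-rater reliability reported in the text
    for k in range(4, 8):
        print(k, round(spearman_brown(r11, k), 2))
    print("raters needed for .90:", raters_needed(r11, 0.90))
```

Under the same assumption, a sixth rater would be needed to reach .90 exactly, so a team of five
trades a small amount of projected reliability for a smaller team.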


How many sites should be visited?


At a minimum, two sites should be visited: one site per team, per specialty. However, if
between-site differences are known for a specialty, as is the case with AFSC 426X2, Jet Engine
Mechanic, it would be advisable to visit additional bases to assure a more representative
sampling of task performance. In considering which sites to visit, Air National Guard, Air
Force Reserve, and research sites should be avoided.


VI.  SUMMARY 

This study established the feasibility of having Air Force personnel collect benchmark
learning difficulty ratings, supporting the transfer of this technology from a research to an
operational setting. It is suggested that to ensure the success and effectiveness of this
methodology, attention be focused on the implementation issues addressed in this paper. In so
doing, the quality of the collected data and resulting data base will be assured.


APPENDIX A: TRAINING COURSE OBJECTIVES


Table A-1. Training Course Objectives


COURSE TITLE: Application of Benchmark Rating Scales for Task Learning Difficulty

AIM: The aim of this course is to prepare occupational analysts to apply, in the field, each
of the three benchmark rating scales for obtaining task learning difficulty data.

ELIGIBILITY: Participants of the course will be Inventory Developers and/or Job Analysts from
the USAF Occupational Measurement Center (OMC). This implies that they will have
familiarity with the occupational structure of the Air Force, and that they are
experienced at interviewing personnel and observing tasks in order to generate
occupational task inventories.

LOCATION: The course venue is the Air Force Human Resources Laboratory, Brooks AFB, Texas.

COURSE TERMINAL OBJECTIVES: Students are expected to meet the following objectives:

1. Objective: State the definition of the task factor term "learning difficulty."
   Condition: State verbally.
   Standard: Without error.

2. Objective: List and define the seven assessment criteria for task learning
   difficulty.
   Condition: State verbally.
   Standard: Without error.

3. Objective: Assign benchmark ratings to target tasks.
   Condition: Both in a group situation and independently.
   Standard: Instructor satisfaction.

4. Objective: Question selected AFS incumbents on the assessment criteria of task
   learning.
   Condition: As a group, question individual incumbents.
   Standard: Instructor satisfaction that sufficient information on the
   assessment criteria for learning difficulty is obtained.

5. Objective: Use the procedural guide associated with each benchmark scale so as
   to become familiar with the anchor tasks which define each point on the scale.
   Condition: Individually to the satisfaction of the instructor.
   Standard: To use the procedural guide effectively to rank the Mechanical
   benchmark anchor tasks without prior familiarization (3rd day exercise).

6. Objective: Describe the procedure used to obtain benchmark ratings of task
   learning difficulty.
   Condition: Describe verbally.
   Standard: Without error.

7. Objective: Select a representative subset of target tasks for benchmarking via
   [illegible].
   Condition: Group situation.
   Standard: Instructor satisfaction.


APPENDIX B: TASK ASSESSMENT CRITERIA DEFINITIONS


Table B-1. Task Assessment Criteria Definitions


Task definition: What is the task? What is and is not included in task performance? For
example, if the task is changing spark plugs, must other components (e.g., air filter,
compressor) be removed first, or is this a separate task?

The number of work steps in a task: Tasks that have many different steps are obviously
more difficult to learn than those which have only a few steps. Tasks that contain many
repetitions of the same step, however, may be relatively easy to learn.

Tools and equipment unique to the task: The learning time required for tools and
equipment unique to a task adds to learning difficulty.

Regulations, manuals and standard operating procedures: How detailed is the
documentation? The more detailed it is, the less has to be learned. Some tasks do not
have to be learned, because they can be performed by simply following written instructions.

Memorization: Does the task or any portion of the task have to be memorized in order to
be performed? This adds to the learning difficulty.

Standards of performance: Tasks differ in what level of quality or reliability is
required for "satisfactory performance." For example, packing a parachute requires a
higher standard of product reliability than does changing a faucet washer. In the latter
case, if the faucet leaks, you can do it again.

Time criticality: A task that must be performed within a time limit is more difficult to
learn than the same task with no time limit for performance.

Basic skills or knowledge: For many career fields, there are required basic skills or
knowledges (e.g., typing, mathematics). In some cases these are taught in the USAF
Technical School. These skills and knowledges add to the learning difficulty of
individual tasks only to the extent that they are used in the performance of that task.


APPENDIX C: LISTING OF SPECIALTIES STUDIED BY BASE AND TEAM


Table C-1. Listing of Specialties Studied by Base and Team

TEAM 1:

  Kirtland AFB, NM
    426X3 - Turboprop-Propulsion Mechanic
    431X2 - Airlift/Bombardment Aircraft Maintenance
    325X1 - Avionics Instrument Systems
    325X0 - Automatic Flight Control Systems

  Holloman AFB, NM
    426X2 - Jet Engine Mechanic
    431X1 - Tactical Aircraft Maintenance
    316X3 - Instrumentation
    272X0 - Air Traffic Control
    251X0 - Weather

TEAM 2:

  Davis-Monthan AFB, AZ
    431X1 - Tactical Aircraft Maintenance
    426X2 - Jet Engine Mechanic
    426X3 - Turboprop-Propulsion Mechanic
    325X1 - Avionics Instrument Systems

  Barksdale AFB, LA
    316X3 - Instrumentation
    325X0 - Automatic Flight Control Systems
    431X2 - Airlift/Bombardment Aircraft Maintenance
    251X0 - Weather

  Randolph AFB, TX
    272X0 - Air Traffic Control

APPENDIX  D:  COMPARISONS  BETWEEN  RATING  TEAMS 
ON  BENCHMARK  LEARNING  DIFFICULTY 


Table D-1. Average Task Difficulty Ratings for
Contractor Team 1 and Team 2

          Contractor Team 1 (N = 7)    Contractor Team 2 (N = 7)
AFSC         Mean        SD               Mean        SD

251X0*       10.0       4.03              12.5       3.46
272X0        11.2       3.05              11.8       2.34
316X3        12.9       3.04              13.2       3.52
325X0        13.7       2.80              14.1       3.38
325X1        13.4       2.73              12.7       2.74
426X2        13.2       3.15              13.3       3.56
426X3        12.9       2.96              13.6       3.39
431X1        13.0       2.36              13.0       2.64
431X2        13.7       2.09              13.4       2.18

*An asterisk indicates a significant difference (p ≤ .05) between teams
on average ratings of difficulty.

Table D-2. Correlations Between Task Difficulty Ratings for
Contractor Team 1 and Team 2 with Supervisors

          Contractor Team 1     Contractor Team 2
AFSC       vs. Supervisors       vs. Supervisors

251X0           .77                   .80
272X0           .71                   .72
316X3           .74                   .74
325X0           .78                   .90
325X1           .70                   .77
426X2           .67                   .72
426X3           .73                   .78
431X1           .84                   .83
431X2           .85                   .58

Note. All correlations are significant (p ≤ .05).


Table D-3. Intercorrelations Among Ratings/Teams - AFSC 251X0

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .95   .96   .79   .79   .69   .75
2. OMC Team 1                     1.00   .81   .76   .76   .66   .74
3. OMC Team 2                           1.00   .74   .75   .65   .70
4. Contractor Total Team                      1.00   .96   .94   .83
5. Contractor Team 1                                1.00   .79   .77
6. Contractor Team 2                                      1.00   .80
7. Supervisors                                                  1.00

Note. All correlations are significant (p ≤ .05).


Table D-4. Intercorrelations Among Ratings/Teams - AFSC 272X0

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .90   .89   .86   .80   .79   .74
2. OMC Team 1                     1.00   .62   .77   .73   .70   .60
3. OMC Team 2                           1.00   .77   .71   .72   .74
4. Contractor Total Team                      1.00   .95   .91   .77
5. Contractor Team 1                                1.00   .72   .71
6. Contractor Team 2                                      1.00   .72
7. Supervisors                                                  1.00

Note. All correlations are significant (p < .05).


Table D-5. Intercorrelations Among Ratings/Teams - AFSC 316X3

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .82   .88   .89   .84   .66   .71
2. OMC Team 1                     1.00   .45   .66   .65   .83   .66
3. OMC Team 2                           1.00   .66   .61   .65   .56
4. Contractor Total Team                      1.00   .95   .97   .77
5. Contractor Team 1                                1.00   .84   .74
6. Contractor Team 2                                      1.00   .74
7. Supervisors                                                  1.00

Note. All correlations are significant (p ≤ .05).

Table D-6. Intercorrelations Among Ratings/Teams - AFSC 325X0

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .96   .9?   .66   .79   .66   .85
2. OMC Team 1                     1.00   .66   .52   .74   .82   .84
3. OMC Team 2                           1.00   .86   .79   .84   .80
4. Contractor Total Team                      1.00   .94   .96   .89
5. Contractor Team 1                                1.00   .81   .78
6. Contractor Team 2                                      1.00   .90
7. Supervisors                                                  1.00

Note. All correlations are significant (p ≤ .05).

Table D-7. Intercorrelations Among Ratings/Teams - AFSC 325X1

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .90   .96   .88   .84   .82   .81
2. OMC Team 1                     1.00   .74   .79   .75   .74   .71
3. OMC Team 2                           1.00   .84   .61   .78   .79
4. Contractor Total Team                      1.00   .94   .94   .78
5. Contractor Team 1                                1.00   .78   .70
6. Contractor Team 2                                      1.00   .77
7. Supervisors                                                  1.00

Note. All correlations are significant (p < .05).


Table D-8. Intercorrelations Among Ratings/Teams - AFSC 426X2

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .91   .97   .91   .86   .85   .82
2. OMC Team 1                     1.00   .78   .84   .84   .77   .82
3. OMC Team 2                           1.00   .87   .82   .82   .75
4. Contractor Total Team                      1.00   .94   .96   .73
5. Contractor Team 1                                1.00   .80   .67
6. Contractor Team 2                                      1.00   .72
7. Supervisors                                                  1.00

Note. All correlations are significant (p < .05).

Table D-9. Intercorrelations Among Ratings/Teams - AFSC 426X3

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .92   .97   .90   .68   .86   .78
2. OMC Team 1                     1.00   .80   .86   .85   .80   .75
3. OMC Team 2                           1.00   .85   .82   .82   .74
4. Contractor Total Team                      1.00   .95   .96   .79
5. Contractor Team 1                                1.00   .84   .73
6. Contractor Team 2                                      1.00   .78
7. Supervisors                                                  1.00

Note. All correlations are significant (p < .05).

Table D-10. Intercorrelations Among Ratings/Teams - AFSC 431X1

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .80   .89   .75   .78   .65   .59
2. OMC Team 1                     1.00   .43   .42   .49   .32   .29
3. OMC Team 2                           1.00   .80   .80   .73   .68
4. Contractor Total Team                      1.00   .95   .96   .88
5. Contractor Team 1                                1.00   .81   .84
6. Contractor Team 2                                      1.00   .83
7. Supervisors                                                  1.00

Table D-11. Intercorrelations Among Ratings/Teams - AFSC 431X2

Variable                       1     2     3     4     5     6     7

1. OMC Total Team           1.00   .87   .93   .72   .66   .63   .54
2. OMC Team 1                     1.00   .62   .61   .53   .55   .36
3. OMC Team 2                           1.00   .68   .64   .59   .59
4. Contractor Total Team                      1.00   .89   .90   .79
5. Contractor Team 1                                1.00   .61   .85
6. Contractor Team 2                                      1.00   .58
7. Supervisors                                                  1.00

Note. All correlations were significant (p < .05).

Table D-12. OMC and Contractor Based ATDPUTS for 49 to 96 and
Total Active Federal Military Service (TAFMS) Groups

                      OMC                          Contractor
          49-96 TAFMS    Total TAFMS      49-96 TAFMS    Total TAFMS
AFSC      Mean     SD    Mean     SD      Mean     SD    Mean     SD

272X0      136   8.25     138   9.85       110   8.59     113  11.23
426X3      134   5.95     134   7.42       133   5.70     133   7.11
316X3      132   9.97     135  10.59       124  10.82     127  11.35
325X0      129   8.94     130   8.95       131   9.38     132   9.44
325X1      130   7.28     131   7.20       128   8.07     129   7.99
426X2      132   8.21     132  11.05       148   6.24     148   9.67
431X1      123   6.10     125   7.92       116  10.66     119  13.82
431X2      121   6.94     123   8.52       122   8.44     126   8.96
251X0      124  16.23     126  17.79       101  20.90     104  23.00